Cache miss prediction method and apparatus.

In a data processing system which employs a cache memory feature, a method and exemplary special purpose apparatus for practicing the method are disclosed to lower the cache miss ratio for called operands. Recent cache misses are stored in a first in, first out miss stack, and the stored addresses are searched for displacement patterns thereamong. Any detected pattern is then employed to predict a succeeding cache miss by prefetching from main memory the signal group identified by the predictive address. The apparatus for performing this task is preferably hard wired for speed purposes and includes subtraction circuits for evaluating variously displaced addresses in the miss stack and comparator circuits for determining if the outputs from at least two subtraction circuits are the same, indicating a pattern yielding information which can be combined with an address in the stack to develop a predictive address.
CACHE MISS PREDICTION METHOD AND APPARATUS



Field of the Invention 
This invention relates to the art of data processing systems which include a cache memory feature and, more particularly, to a method and apparatus for predicting memory cache misses for operand calls and using this information to transfer data from a main memory to cache memory to thereby lower the cache miss ratio.

Background of the Invention 

The technique of employing a high speed cache memory intermediate a processor and a main memory to hold a dynamic subset of the information in the main memory in order to speed up system operation is well known in the art. Briefly, the cache holds a dynamically variable collection of main memory information fragments selected and updated such that there is a good chance that the fragments will include instructions and/or data required by the processor in upcoming operations. If there is a cache "hit" on a given operation, the information is available to the processor much faster than if main memory had to be accessed to obtain the same information. Consequently, in many high performance data processing systems, the "cache miss ratio" is one of the major limitations on the system execution rate, and it should therefore be kept as low as possible.

The key to obtaining a low cache miss ratio is carefully selecting the information to be placed in the cache from main memory at any given instant. There are several techniques for selecting blocks of instructions for transitory residence in the cache, and the more or less linear use of instructions in programming renders these techniques statistically effective. However, the selection of operand information to be resident in cache memory at a given instant has been much less effective and has been generally limited to transferring one or more contiguous blocks including a cache miss address. This approach only slightly lowers the cache miss ratio and is also an ineffective use of cache capacity.

Thus, those skilled in the art will understand that it would be highly desirable to provide means for selecting operand information for transitory storage in a cache memory in such a manner as to significantly lower the cache miss ratio, and it is to that end that the present invention is directed.
Summary of the Invention

Briefly, these and other objects of the invention are achieved by special purpose apparatus in the cache memory unit which stores recent cache misses and searches for patterns therein. Any detected pattern is then employed to anticipate a succeeding cache miss by prefetching from main memory the block containing the predicted cache miss.

Description of the Drawing

The subject matter of the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, may best be understood by reference to the following description taken in conjunction with the subjoined claims and the accompanying drawing of which:

FIG. 1 is a generalized block diagram of a typical data processing system employing a cache memory and therefore constituting an exemplary environment for practicing the invention;

FIG. 2 is a flow diagram illustrating, in simplified form, the sequence of operations by which the invention is practiced;

FIG. 3 is a logic diagram of a simple exemplary embodiment of the invention; and

FIG. 4 is a logic diagram of a more powerful exemplary embodiment of the invention.


Detailed Description of the Invention 

Referring now to FIG. 1, there is shown a high level block diagram for a data processing system incorporating a cache memory feature. Those skilled in the art will appreciate that this block diagram is only exemplary and that many variations on it are employed in practice. Its function is merely to provide a context for discussing the subject invention. Thus, the illustrative data processing system includes a main memory unit 13 which stores the data signal groups (i.e., information words, including instructions and operands) required by a central processing unit 14 to execute the desired procedures. Signal groups with an enhanced probability of being required by the central processing unit 14 in the near term are transferred from the main memory unit 13 (or a user unit 15) through a system interface unit 11 to a cache memory unit 12. (Those skilled in the art will understand that, in some data processing system architectures, the signal groups are transferred over a system bus, thereby requiring an interface unit for each component interacting with the system bus.) The signal groups are stored in the cache memory unit 12 until requested by the central processing unit 14. To retrieve the correct signal group, address translation apparatus 16 is typically incorporated to convert a virtual address (used by the central processing unit 14 to identify the signal group to be fetched) to the real address used for that signal group by the remainder of the data processing system.

The information stored transiently in the cache memory unit 12 may include both instructions and operands, stored in separate sections or stored homogeneously. Preferably, in the practice of the present invention, instructions and operands are stored in separate (at least in the sense that they do not have commingled addresses) memory sections in the cache memory unit 12, inasmuch as it is intended to invoke the operation of the present invention as to operand information only.

The present invention is based on recognizing and taking advantage of sensed patterns in cache misses resulting from operand calls. In an extremely elementary example, consider a sensed pattern in which three consecutive misses ABC are, in fact, successive operand addresses, with D being the next successive address. This might take place, merely by way of example, in a data manipulation process calling for successively accessing successive rows in a single column of data. If this pattern is sensed, the likelihood that signal group D will also be accessed, and soon, is enhanced such that its prefetching into the cache memory unit 12 is in order.

The fundamental principles of the invention are set forth in the operational flow chart of FIG. 2. When a processor (or other system unit) asks for an operand, a determination is made as to whether or not the operand is currently resident in the cache. If so, there is a cache hit (i.e., no cache miss), the operand is sent to the requesting system unit, and the next operand request is awaited. However, if there is a cache miss, the request is, in effect, redirected to the (much slower) main memory.
Those skilled in the art will understand that the description of FIG. 2 to this point describes cache memory operation generally. In the present invention, however, the address of the cache miss is meaningful. It is therefore placed at the top of a miss stack, to be described in further detail below. The miss stack (which contains a history of the addresses of recent cache misses in consecutive order) is then examined to determine if a first of several patterns is present. The first pattern might be, merely by way of example, contiguous addresses for the recent cache misses. If the first pattern is not sensed, additional patterns are tried. Merely by way of example again, a second pattern might be recent cache misses calling for successive addresses situated two locations apart. So long as there is no pattern match, the process continues through the pattern repertoire. If there is no match when all patterns in the repertoire have been examined, the next cache miss is awaited to institute the process anew.
However, if a pattern in the repertoire is detected, a predictive address is calculated from the information in the miss stack and from the sensed pattern. This predictive address is then employed to prefetch from main memory into cache the signal group identified by the predictive address. In the elementary example previously given, if a pattern is sensed in which the consecutive operand cache miss addresses ABC are contiguous, the value of the predictive address D will be C + 1.
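The arithmetic of this elementary example can be written out generally. The symbols are introduced here for illustration only: M_1, M_2, M_3 denote three consecutive miss addresses from oldest to newest (A, B, C in this example), δ the common displacement, and P the predictive address.

```latex
\begin{aligned}
\delta &= M_3 - M_2 = M_2 - M_1, \\
P &= M_3 + \delta, \\
(M_1, M_2, M_3) = (A, B, C) \text{ contiguous:}\quad \delta &= 1, \qquad D = P = C + 1.
\end{aligned}
```

The same relation, with M_3 = B and M_2 = D, yields the FIG. 3 predictive address B + (B - D) discussed later, where the stack letters run newest first.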

In order to optimize the statistical integrity of the miss stack, the predictive address itself may be placed at the top of the stack, since it would (highly probably) itself have been the subject of a cache miss if it had not been prefetched in accordance with the invention.
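A first in, first out miss stack of this kind maps naturally onto a bounded double-ended queue. The Python sketch below assumes a sixteen-entry depth (borrowed from the FIG. 3 embodiment) and uses `collections.deque` with `maxlen`, whose `appendleft` discards the bottom entry once the stack is full; `push_prediction` and `STACK_DEPTH` are hypothetical names.

```python
from collections import deque

STACK_DEPTH = 16  # example depth, matching the FIG. 3 miss stack


def push_prediction(miss_stack, predictive_address):
    """Place a prefetched (predictive) address at the top of the miss stack,
    as if it had itself caused a cache miss.

    Because the deque is bounded, appendleft discards the entry at the
    bottom (right end) once the stack is full: first in, first out."""
    miss_stack.appendleft(predictive_address)
    return miss_stack


stack = deque(range(16, 0, -1), maxlen=STACK_DEPTH)  # 16 on top, 1 oldest
push_prediction(stack, 17)
print(stack[0], stack[-1], len(stack))  # 17 2 16
```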

Since speed of operation is essential, the invention may advantageously be embodied in a hard wired form (e.g., in a gate array), although firmware control is contemplated. Consider first a relatively simple hardwired implementation shown in FIG. 3. A miss stack 20 holds the sixteen most recent cache miss addresses, the oldest being identified as address P, with entry onto the stack being made at the top. Four four-input electronic switches 21, 22, 23, 24 are driven in concert by a shift pattern signal via line 25 such that, in a first state, addresses A, B, C, D appear at the respective outputs of the switches; in a second state, addresses B, D, F, H appear at the outputs; in a third state, addresses C, F, I, L appear at the outputs; and in a fourth state, addresses D, H, L, P appear at the outputs. Subtraction circuits 26, 27, 28 are connected to receive as inputs the respective outputs of the electronic switches 21, 22, 23, 24 such that the output from the subtraction circuit 26 is the output of the switch 21 minus the output of the switch 22; the output from the subtraction circuit 27 is the output of the switch 22 minus the output of the switch 23; and the output from the subtraction circuit 28 is the output of the switch 23 minus the output of the switch 24.

The output from the subtraction circuit 26 is applied to one input of an adder circuit 31 which has its other input driven by the output of the electronic switch 21. In addition, the output from the subtraction circuit 26 is also applied to one input of a comparator circuit 29. The output from the subtraction circuit 27 is applied to the other input of the comparator circuit 29 and also to one input of another comparator circuit 30 which has its other input driven by the output of the subtraction circuit 28. The outputs from the comparator circuits 29, 30 are applied, respectively, to the two inputs of an AND-gate 32 which selectively issues a prefetch enable signal.
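The combinational path just described can be modelled as a pure function of the four switch outputs. This is a behavioural sketch in Python, not a gate-level design; `Fig3Result` and `fig3_datapath` are hypothetical names, and addresses are treated as plain integers, which glosses over the multi-bit width issue noted at the end of this description.

```python
from typing import NamedTuple


class Fig3Result(NamedTuple):
    prefetch_enable: bool
    predictive_address: int


def fig3_datapath(s21: int, s22: int, s23: int, s24: int) -> Fig3Result:
    """Evaluate the FIG. 3 logic for one set of switch 21-24 outputs."""
    d26 = s21 - s22              # subtraction circuit 26
    d27 = s22 - s23              # subtraction circuit 27
    d28 = s23 - s24              # subtraction circuit 28
    cmp29 = d26 == d27           # comparator circuit 29
    cmp30 = d27 == d28           # comparator circuit 30
    enable = cmp29 and cmp30     # AND-gate 32: prefetch enable signal
    predictive = s21 + d26       # adder circuit 31, meaningful only when enabled
    return Fig3Result(enable, predictive)


print(fig3_datapath(208, 206, 204, 202))
# Fig3Result(prefetch_enable=True, predictive_address=210)
```

The adder output is always present, mirroring the hardware; it is the prefetch enable signal that decides whether it is used.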

Consider now the operation of the circuit shown in FIG. 3. As previously noted, miss stack 20 holds the last sixteen cache miss addresses, address A being the most recent. When the request for the signal group identified by address A results in a cache miss, circuit operation is instituted to search for a pattern among the addresses resident in the miss stack. The electronic switches 21, 22, 23, 24 are at their first state such that address A is passed through to the output of switch 21, address B appears at the output of switch 22, address C appears at the output of switch 23, and address D appears at the output of switch 24. If the differences between A and B, B and C, and C and D are not all equal, not all the outputs from the subtraction circuits 26, 27, 28 will be equal, such that one or both of the comparator circuits 29, 30 will issue a no compare, and AND-gate 32 will not be enabled, thus indicating a "no pattern match found" condition. The switches are then advanced to their second state, in which addresses B, D, F, H appear at their respective outputs. Assume now that (B - D) = (D - F) = (F - H); i.e., a sequential pattern has been sensed in the address displacements. Consequently, both the comparators 29, 30 will issue compare signals to fully enable the AND-gate 32 and produce a prefetch enable signal. Simultaneously, the output from the adder circuit 31 will be the predictive address (B + (B - D)). It will be seen that this predictive address extends the sensed pattern and thus increases the probability that the prefetched signal group will be requested by the processor, thereby lowering the cache miss ratio.

If a pattern had not been sensed in the address combination BDFH, the electronic switches would have been advanced to their next state to examine the address combination CFIL and then on to the address combination DHLP if necessary. If no pattern was sensed, the circuit would await the next cache miss, which will place a new entry at the top of the miss stack and push address P out the bottom of the stack before the pattern match search is again instituted.
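The state sequencing can be sketched by stepping a state counter and reading the stack. The index rule in `switch_indices`, where state k selects stack positions k-1, 2k-1, 3k-1 and 4k-1 (counting A as position 0), is inferred here from the four address combinations listed above rather than stated in the text; both function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def switch_indices(state: int) -> List[int]:
    """Stack positions routed to switches 21-24 in a given state (1-4),
    counting address A as position 0: ABCD, BDFH, CFIL, DHLP."""
    return [n * state - 1 for n in range(1, 5)]


def fig3_search(miss_stack: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Step the shift pattern through the four switch states and return
    (state, predictive address) for the first match, or None."""
    for state in range(1, 5):
        a, b, c, d = (miss_stack[i] for i in switch_indices(state))
        d26, d27, d28 = a - b, b - c, c - d   # subtraction circuits 26-28
        if d26 == d27 == d28:                 # comparators 29, 30 and AND-gate 32
            return state, a + d26             # adder circuit 31
    return None  # no pattern: await the next cache miss


# A-P, newest first; the odd positions B, D, F, H hold 208, 206, 204, 202.
stack = [900, 208, 700, 206, 650, 204, 100, 202,
         42, 200, 13, 198, 77, 196, 5, 194]
print(fig3_search(stack))  # (2, 210)
```

Here the first state (ABCD) fails because the even positions are unrelated, and the second state (BDFH) returns B + (B - D) = 210.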

Consider now the somewhat more complex and powerful embodiment of the invention illustrated in FIG. 4. Electronic switches 41, 42, 43, 44 receive at their respective inputs recent cache miss addresses as stored in the miss stack 40 in the exemplary arrangement shown. It will be noted that each of the electronic switches 41, 42, 43, 44 has eight inputs, which can be sequentially selectively transferred to the single outputs under the influence of the shift pattern signal. It will also be noted that the miss stack 40 stores, in addition to the sixteen latest cache miss addresses A through P, three future entries W, X, Y. Subtraction circuits 45, 46, 47 perform the same office as the corresponding subtraction circuits 26, 27, 28 of the FIG. 3 embodiment previously described. Similarly, adder circuit 48 corresponds to the adder circuit 31 previously described.

Comparator circuit 49 receives the respective outputs of the subtraction circuits 45, 46, and its output is applied to one input of an AND-gate 38 which selectively issues the prefetch enable signal. Comparator circuit 50 receives the respective outputs of the subtraction circuits 46, 47, but, unlike its counterpart comparator 30 of the FIG. 3 embodiment, its output is applied to one input of an OR-gate 39 which has its other input driven by a reduce lookahead signal. The output of OR-gate 39 is coupled to the other input of AND-gate 38. With this arrangement, activation of the reduce lookahead signal enables OR-gate 39 and partially enables AND-gate 38. The effect of applying the reduce lookahead signal is to compare only the outputs of the subtraction circuits 45, 46 in the comparator circuit 49, such that a compare fully enables the AND-gate 38 to issue the prefetch enable signal. This mode of operation may be useful, for example, when the patterns seem to be changing every few cache misses, and it favors the most recent examples.
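The FIG. 4 enable path reduces to a small Boolean expression. The Python sketch below assumes the three integer inputs are the displacement values from subtraction circuits 45, 46 and 47; the function name is hypothetical.

```python
def fig4_prefetch_enable(d45: int, d46: int, d47: int,
                         reduce_lookahead: bool) -> bool:
    """Prefetch enable from the outputs of subtraction circuits 45, 46, 47."""
    cmp49 = d45 == d46                  # comparator circuit 49
    cmp50 = d46 == d47                  # comparator circuit 50
    or39 = cmp50 or reduce_lookahead    # OR-gate 39
    return cmp49 and or39               # AND-gate 38


print(fig4_prefetch_enable(4, 4, 9, False))  # False
print(fig4_prefetch_enable(4, 4, 9, True))   # True
```

With displacements 4, 4, 9 the full lookahead rejects the pattern, but asserting the reduce lookahead signal accepts it, because only the two displacements feeding comparator 49 are then compared.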

With the arrangement of FIG. 4, it is advantageous to try all the patterns within pattern groups (as represented by the "YES" response to the ">1 PATTERN GROUP?" query in the flow diagram of FIG. 2) even if a pattern match is detected intermediate the process. This follows from the fact that more than one of the future entries WXY to the miss stack may be developed during a single pass through the pattern repertoire or even a subset of the pattern repertoire. With the specific implementation of FIG. 4 (which is only exemplary of many possible useful configurations), the following results are obtainable:
The goal states are searched in groups by switch state; i.e.: Group 1 includes switch states 0, 1, 2 and could result in filling future entries WXY; Group 2 includes states 3, 4 and could result in filling entries WX; Group 3 includes states 5, 6 and could also result in filling entries WX; and Group 4 includes state 7 and could result in filling entry W. When a goal state is reached that has been predicted, the search is halted for the current cache miss; i.e., it would not be desirable to replace an already developed predictive address W with a different predictive address W.
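The grouped search can be sketched as a control loop. Because the exact wiring of miss stack 40 to switches 41 through 44 is given only in the drawing, the per-state datapath is represented by a hypothetical callback `evaluate(state)` that returns whatever future entries that state predicts. The halting rule is also an interpretation: each group is searched completely, an already filled entry is never overwritten, and the search stops after the first group that fills W.

```python
from typing import Callable, Dict, Sequence, Tuple

# Switch-state groups as listed for the FIG. 4 embodiment.
GROUPS: Tuple[Tuple[int, ...], ...] = ((0, 1, 2), (3, 4), (5, 6), (7,))


def search_groups(evaluate: Callable[[int], Dict[str, int]],
                  groups: Sequence[Tuple[int, ...]] = GROUPS) -> Dict[str, int]:
    """Develop future entries W, X, Y for one cache miss.

    `evaluate(state)` is a hypothetical stand-in for the FIG. 4 datapath; it
    returns {entry: predictive_address} for the entries that state predicts.
    """
    future: Dict[str, int] = {}
    for group in groups:
        for state in group:                        # try every pattern in the group
            for entry, address in evaluate(state).items():
                future.setdefault(entry, address)  # never replace a filled entry
        if "W" in future:                          # W developed: halt for this miss
            break
    return future


# Hypothetical per-state predictions for one cache miss.
predictions = {1: {"W": 100, "X": 101}, 2: {"W": 555, "Y": 102}, 3: {"W": 999}}
print(search_groups(lambda s: predictions.get(s, {})))
# {'W': 100, 'X': 101, 'Y': 102}
```

State 2's competing W (555) is ignored because W is already filled, and state 3 is never evaluated because Group 1 has already developed W.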

Those skilled in the art will understand that the logic circuitry of FIGs. 3 and 4 is somewhat simplified, since multiple binary digit information is presented as if it were single binary digit information. Thus, in practice, arrays of electronic switches, gates, etc. will actually be employed to handle the added dimension as may be necessary and entirely conventionally. Further, timing signals and logic for incorporating the inventive structure into a given data processing system environment will be those appropriate for that environment and will be the subject of straightforward logic design.

Thus, while the principles of the invention have now been made clear in an illustrative embodiment, there will be immediately obvious to those skilled in the art many modifications of structure, arrangements, proportions, the elements, materials, and components used in the practice of the invention which are particularly adapted for specific environments and operating requirements without departing from those principles.

Claims 

1. In a data processing system incorporating a cache memory, a method for predicting cache miss 
addresses comprising the steps of:

A) establishing a miss stack for storing a plurality of cache miss addresses;

B) waiting for a cache miss; 

C) when a cache miss occurs, placing the address of the called information onto the top of the miss stack;

D) examining the miss stack for an address pattern among the resident cache miss addresses;

E) if a pattern is not sensed, returning to step B); and

F) if a pattern is sensed: 

1) using the sensed pattern and at least one of the addresses in the miss stack to calculate a predictive address;

2) prefetching into cache memory the signal group identified by the predictive address; and

3) returning to step B). 

2. The method of Claim 1 in which, during step D), a repertoire of predetermined address patterns is searchable, and the examination continues from pattern to pattern until a first pattern match is sensed.

3. In a data processing system incorporating a cache memory, a method for predicting cache miss
addresses comprising the steps of: 

A) establishing a miss stack for storing a plurality of cache miss addresses; 

B) waiting for a cache miss; 

C) when a cache miss occurs, placing the address of the called information onto the top of the miss stack;

D) examining the cache miss addresses resident in the miss stack for a match with a selected
address pattern in a current group of a plurality of groups of patterns; 

E) if the selected pattern is not sensed, determining if all the patterns in the current group have been examined;

F) if all the patterns in the current group have not been examined, selecting another pattern in the current group and returning to step D);

G) if all the patterns in all the groups in the pattern repertoire have been searched, returning to step B);

H) if all the patterns in the current group have been examined, assigning another group as the current 
group, selecting a pattern from the new current group and returning to step D); and 

I) if the selected pattern is sensed: 

1) using the sensed pattern and at least one of the addresses in the miss stack to calculate a predictive address;

2) prefetching into cache memory the signal group identified by the predictive address; and 

3) assigning another group as the current group [...]

4. The method of Claim 3 in which, intermediate substeps I)1) and I)3), there is performed substep I)2)a) in which the predictive address is placed onto the miss stack.

5. Apparatus for developing a predictive address for prefetching information into a cache memory comprising:

A) a first in, first out stack for storing a plurality of addresses representing cache misses;

B) a plurality of electronic switch means each having a plurality of address inputs and a single address output;

C) means for coupling said addresses in said stack individually to said electronic switch means inputs in predetermined orders;

D) means for switching said electronic switch means to transfer said addresses applied to said electronic switch means inputs to said electronic switch outputs to establish at said electronic switch outputs predetermined combinations of said addresses;

E) at least two subtraction circuit means, each coupled to receive a pair of different addresses from said electronic switch means outputs and to issue a value representing the displacement therebetween;
F) at least one comparator circuit means coupled to receive a pair of outputs from a corresponding pair of said subtraction circuit means and responsive thereto for issuing a prefetch enable logic signal if there is a compare condition; and

G) predictive address development means adapted to combine one of said addresses appearing at one of said electronic switch outputs and displacement information appearing at one of said subtraction circuit means to obtain a predictive address;

whereby the coordinated presence of said predictive address and said prefetch enable logic signal causes a signal group identified by said predictive address to be prefetched into said cache memory.
6. The apparatus of Claim 5 which includes at least three of said subtraction circuit means and at least two of said comparator circuit means and which further comprises:

A) AND-gate means having separate inputs respectively receiving outputs coupled from said at least two comparator circuit means, said AND-gate means selectively issuing said prefetch enable logic signal only when fully enabled.

7. The apparatus of Claim 6 which further includes: 

A) OR-gate means driving at least one input to said AND-gate means, said OR-gate means having inputs receiving:

1. outputs coupled from at least one of said comparator circuits; and 

2. a selectively applied reduce lookahead logic signal; 

whereby application of said reduce lookahead signal to said OR-gate means partially enables said AND-gate means driven thereby and thus eliminates said at least one of said comparator circuits from consideration in the issuance of said prefetch enable logic signal.

EP 0402 787 A2